Measuring efficiency in high-accuracy, broad-coverage statistical parsing
Very little attention has been paid to comparing efficiency between
high-accuracy statistical parsers. This paper proposes one machine-independent
metric that is general enough to allow comparisons across very different
parsing architectures. This metric, which we call "events considered",
measures the number of "events", however they are defined for a particular
parser, for which a probability must be calculated in order to find the parse.
It is applicable to single-pass or multi-stage parsers. We discuss the
advantages of the metric and demonstrate its usefulness by using it to compare
two parsers that differ in several fundamental ways.
Comment: 8 pages, 4 figures, 2 tables
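The "events considered" metric can be illustrated with a minimal sketch: wrap every probability evaluation in a counter, so that whatever a parser treats as an "event", each scored event increments the tally. The names below (`EventCounter`, `parse_with_counter`, the toy chart items) are hypothetical illustrations, not from the paper.

```python
# Hypothetical sketch of counting "events considered": every probability
# the parser evaluates is one event, regardless of parser architecture.

class EventCounter:
    """Counts probability evaluations ("events considered")."""

    def __init__(self):
        self.count = 0

    def score(self, probability):
        # Each call corresponds to one event whose probability is computed.
        self.count += 1
        return probability


def parse_with_counter(chart_items, counter):
    # Stand-in for a parser's inner loop: every scored item is one event.
    best = None
    for item, prob in chart_items:
        p = counter.score(prob)
        if best is None or p > best[1]:
            best = (item, p)
    return best


counter = EventCounter()
items = [("NP -> DT NN", 0.4), ("NP -> NNP", 0.3), ("NP -> PRP", 0.3)]
best = parse_with_counter(items, counter)
print(best, counter.count)  # 3 events considered
```

Because the counter only observes probability calls, the same wrapper applies to single-pass and multi-stage parsers alike, which is what makes the metric architecture-neutral.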
Explaining vowel inventory tendencies via simulation: finding a role for quantal locations and formant normalization
Disambiguatory Signals are Stronger in Word-initial Positions
Psycholinguistic studies of human word processing and lexical access provide
ample evidence of the preferred nature of word-initial versus word-final
segments, e.g., in terms of attention paid by listeners (greater) or the
likelihood of reduction by speakers (lower). This has led to the conjecture --
as in Wedel et al. (2019b), but common elsewhere -- that languages have evolved
to provide more information earlier in words than later. Information-theoretic
methods to establish such tendencies in lexicons have suffered from several
methodological shortcomings that leave open the question of whether this high
word-initial informativeness is actually a property of the lexicon or simply an
artefact of the incremental nature of recognition. In this paper, we point out
the confounds in existing methods for comparing the informativeness of segments
early in the word versus later in the word, and present several new measures
that avoid these confounds. When controlling for these confounds, we still find
evidence across hundreds of languages that indeed there is a cross-linguistic
tendency to front-load information in words.
Comment: Accepted at EACL 2021. Code is available at
https://github.com/tpimentelms/frontload-disambiguatio
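One simple information-theoretic quantity of the kind discussed above is the average surprisal of the segment at each position, conditioned on the word-initial prefix. The sketch below is an illustrative toy version of such a measure (uniform word frequencies, a five-word lexicon), not the paper's exact confound-controlled measures.

```python
# Toy positional-surprisal sketch: for each segment position, average
# -log2 P(segment | word-initial prefix) over the lexicon. Assumes
# uniform word frequencies; illustrative only.
from collections import defaultdict
from math import log2


def positional_surprisal(lexicon):
    # Count, for every prefix, how often each segment continues it.
    continuations = defaultdict(lambda: defaultdict(int))
    for word in lexicon:
        for i, seg in enumerate(word):
            continuations[word[:i]][seg] += 1

    # Average surprisal of the observed segment at each position.
    by_position = defaultdict(list)
    for word in lexicon:
        for i, seg in enumerate(word):
            counts = continuations[word[:i]]
            p = counts[seg] / sum(counts.values())
            by_position[i].append(-log2(p))
    return {i: sum(v) / len(v) for i, v in by_position.items()}


lexicon = ["cat", "cab", "can", "dog", "dot"]
print(positional_surprisal(lexicon))
```

Note that in a tiny lexicon like this the profile need not decrease with position; the cross-linguistic front-loading tendency the paper reports emerges over full lexicons, and only once the recognition-order confounds it describes are controlled for.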